(54) Computer system including a bus bridge implementing adaptive speculative read operations



(57) A computer system includes a microprocessor 
coupled to a main memory through a bridge logic unit. 
The bridge logic unit receives memory read requests
from the microprocessor and provides the requests to 
the main memory. The bridge logic unit includes a mem- 
ory fetch control unit configured to fetch a single line of 
data from the main memory in response to an initial read 
request from the microprocessor. If a read request to a 
sequential line of data is received from the microproc- 
essor, the memory fetch control unit fetches not only the 
requested line of data but also the next sequential line
of data. Thus, following the initial read request in which
a single line of data is fetched, when the microprocessor 
issues a request for data from a sequential line, that line 
is fetched and the subsequent line is speculatively 
prefetched. If the microprocessor continues with a re- 
quest to yet an additional sequential line, the memory 
fetch unit continues its speculative generation of a request
for the next sequential line. If the microprocessor
issues a memory read request to a non-sequential line 
of data, the memory fetch control unit fetches only that 
line of data. 
Description

[0001] This invention relates to computer systems 
and, more particularly, to integrated bus bridge designs 
for use in high performance computer systems. The in- 
vention also relates to prefetching mechanisms em- 
ployed in computer systems. 

[0002] Computer architectures generally include a 
plurality of devices interconnected by one or more bus- 
es. For example, conventional computer systems typi- 
cally include a CPU coupled through bridge logic to an 
external main memory. A main memory controller is thus 
typically incorporated within the bridge logic to generate 
various control signals for accessing the main memory. 
An interface to a high bandwidth local expansion bus, 
such as the Peripheral Component Interconnect (PCI) 
bus, may also be included as a portion of the bridge logic.
Examples of devices which can be coupled to the local
expansion bus include network interface cards, video
accelerators, audio cards, SCSI adapters, telephony
cards, etc. An older-style expansion bus may be sup- 
ported through yet an additional bus interface to provide 
compatibility with earlier-version expansion bus adapt- 
ers. Examples of such expansion buses include the In- 
dustry Standard Architecture (ISA) bus, also referred to 
as the AT bus, the Extended Industry Standard Archi- 
tecture (EISA) bus, and the MicroChannel Architecture 
(MCA) bus. Various devices may be coupled to this sec- 
ond expansion bus, including a fax/modem card, sound 
card, etc. 

[0003] The bridge logic can link or interface more than 
simply the CPU bus, a peripheral bus such as a PCI bus, 
and the memory bus. In applications that are graphics 
intensive, a separate peripheral bus optimized for 
graphics related transfers may be supported by the 
bridge logic. A popular example of such a bus is the AGP 
(Advanced Graphics Port) bus. AGP is generally con- 
sidered a high performance, component level intercon- 
nect optimized for three dimensional graphical display 
applications, and is based on a set of performance ex- 
tensions or enhancements to PCI. AGP came about, in 
part, from the increasing demands placed on memory 
bandwidths for three dimensional renderings. AGP pro- 
vided an order of magnitude bandwidth improvement for 
data transfers between a graphics accelerator and sys- 
tem memory. This allowed some of the three dimension- 
al rendering data structures to be effectively shifted into 
main memory, relieving the costs of incorporating large 
amounts of memory local to the graphics accelerator or 
frame buffer. 

[0004] AGP uses the PCI specification as an opera- 
tional baseline, yet provides three significant perform- 
ance extensions or enhancements to that specification. 
These extensions include a deeply pipelined read and 
write operation, demultiplexing of address and data on 
the AGP bus, and AC timing specifications for faster data
transfer rates. 

[0005] Since computer systems were originally developed
for business applications including word processing
and spreadsheets, among others, the bridge logic
within such systems was generally optimized to provide
the CPU with relatively good performance with respect
to its access to main memory. The bridge logic generally
provided relatively poor performance, however, with
respect to main memory accesses by other devices residing
on peripheral buses, and similarly provided relatively
poor performance with respect to data transfers between
the CPU and peripheral buses as well as between
peripheral devices interconnected through the bridge
logic.

[0006] Recently, however, computer systems have
been increasingly utilized in the processing of various
real time applications, including multimedia applications
such as video and audio, telephony, and speech recognition.
These systems require not only that the CPU
have adequate access to the main memory, but also that
devices residing on various peripheral buses such as
an AGP bus and a PCI bus have fair access to the main
memory. Furthermore, it is often important that transactions
between the CPU, the AGP bus and the PCI bus
be efficiently handled. The bus bridge logic for a modern
computer system should accordingly include mechanisms
to efficiently prioritize and arbitrate among the
varying requests of devices seeking access to main
memory and to other system components coupled
through the bridge logic.

[0007] The bus bridge logic of a computer system may
also be configured to speculatively fetch data in anticipation
that the CPU or a peripheral device will request
sequential data with respect to a current request. For
example, in some systems when the CPU requests data
residing at a particular address, speculative read logic
incorporated in the bus bridge design will fetch the line
of data containing the requested data and additionally
request a sequential line of data. If the processor then
requests further sequential lines of data, the speculative
read logic continues to prefetch the next line of data. As
long as the processor continues to read sequential data,
the number of reads requested by the speculative read
logic will follow a pattern of 2-1-1-1. In
this manner, the fetch of data occurs one line ahead of
the actual data needed. Performance may thereby be
enhanced by reducing latencies associated with memory
accesses, as long as the requests are sequential.
However, since the CPU may not consistently request
data from sequential lines, the speculative read logic
may establish a read pattern of 2-2-2-2 during non-sequential
access patterns. When such accesses occur,
the second read (i.e., the speculative read data) in each
case is thrown away, resulting in wasted memory bandwidth.
Accordingly, the overall performance of the system
may be degraded.
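
The read-count patterns described above can be reproduced with a small model. The following Python sketch is illustrative only: the function name `always_prefetch_reads` and the use of integer line numbers are assumptions made for this example, not part of the described hardware. It models a bridge that always prefetches the next sequential line.

```python
def always_prefetch_reads(lines):
    """Count memory reads per CPU request for a bridge that ALWAYS
    prefetches the next sequential line (hypothetical model)."""
    prefetched = None  # line number currently held as speculative data
    counts = []
    for line in lines:
        if line == prefetched:
            # The requested line was already prefetched:
            # only the next sequential line is read.
            counts.append(1)
        else:
            # Miss: read the requested line plus the next sequential line.
            # Any previously prefetched line is discarded (wasted bandwidth).
            counts.append(2)
        prefetched = line + 1
    return counts


print(always_prefetch_reads([10, 11, 12, 13]))  # sequential: [2, 1, 1, 1]
print(always_prefetch_reads([10, 40, 7, 93]))   # non-sequential: [2, 2, 2, 2]
```

In the non-sequential case every second read is thrown away, which is the wasted bandwidth the paragraph refers to.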

[0008] The problems outlined above are in large part
solved by a computer system including a bus bridge apparatus
having speculative read logic in accordance
with the present invention. In one embodiment, a computer
system includes a microprocessor coupled to a
main memory through a bridge logic unit. The bridge logic
unit receives memory read requests from the microprocessor
and provides the requests to the main memory.
The bridge logic unit includes a memory fetch control
unit configured to fetch a single line of data from the
main memory in response to an initial read request from
the microprocessor. If a read request to a sequential line
of data is received from the microprocessor, the memory
fetch control unit fetches not only the requested line of
data but also the next sequential line of data. Thus, following
the initial read request in which a single line of
data is fetched, when the microprocessor issues a request
for data from a sequential line, that line is fetched
and the subsequent line is speculatively prefetched. If
the microprocessor continues with a request to yet an
additional sequential line, the memory fetch unit continues
its speculative generation of a request for the next
sequential line. If the microprocessor issues a memory
read request to a non-sequential line of data, the memory
fetch control unit fetches only that line of data.
[0009] The memory fetch control unit accordingly implements
an adaptive speculative read algorithm wherein
speculative read data is prefetched only if a sequential
request is received. During non-sequential access patterns,
the number of read requests will follow a pattern
of 1-1-1-1. Thus, inaccurate speculative fetching of data
may be prevented. On the other hand, when sequential
requests are repetitively made, a read request pattern
of 1-2-1-1-1 results, wherein speculative data is continuously
fetched upon detection of the first sequential access
until the sequential accesses terminate. Better hit
rates and efficiency may thereby be attained, and memory
bandwidth may be conserved.
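
The adaptive behavior can be sketched in the same style. This Python model is a minimal illustration under stated assumptions: the function name `adaptive_reads`, the integer line numbers, and the choice to treat "sequential" as "exactly one line after the previous request" are all hypothetical simplifications, not the claimed implementation.

```python
def adaptive_reads(lines):
    """Count memory reads per CPU request for an adaptive scheme that
    prefetches only after a sequential request is seen (hypothetical model)."""
    last = None        # previously requested line
    prefetched = None  # speculatively fetched line, if any
    counts = []
    for line in lines:
        sequential = last is not None and line == last + 1
        if sequential and line == prefetched:
            # Hit on the speculative line: read only the next sequential line.
            counts.append(1)
            prefetched = line + 1
        elif sequential:
            # First sequential access: read the requested line and the next one.
            counts.append(2)
            prefetched = line + 1
        else:
            # Initial or non-sequential access: read only the requested line.
            counts.append(1)
            prefetched = None
        last = line
    return counts


print(adaptive_reads([10, 11, 12, 13, 14]))  # sequential: [1, 2, 1, 1, 1]
print(adaptive_reads([10, 40, 7, 93]))       # non-sequential: [1, 1, 1, 1]
```

Under this model no read is ever discarded during a purely non-sequential stream, at the cost of one unprefetched line at the start of each sequential run.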
[0010] Other objects and advantages of the invention 
will become apparent upon reading the following de- 
tailed description and upon reference to the accompa- 
nying drawings in which: 

[0011] Fig. 1 is a block diagram of a computer system
including an integrated bridge logic unit. 
[0012] Fig. 2 is a block diagram of one embodiment 
of a bridge logic unit. 

[0013] Fig. 3 is a block diagram of one implementation
of a CPU interface.

[0014] Fig. 4A is a block diagram illustrating aspects 
of a suitable embodiment of a PCI interface. 
[0015] Fig. 4B is a block diagram of an implementa- 
tion of a PCI master transient read buffer employed with- 
in a PCI interface master control unit. 
[0016] Fig. 4C is a block diagram of an implementa- 
tion of a PCI master transient write buffer employed 
within a PCI interface master control unit. 
[0017] Fig. 4D is a diagram illustrating aspects of an 
exemplary implementation of a PCI slave transient read 
buffer. 

[0018] Fig. 5 is a block diagram of one embodiment 
of an AGP interface. 

[0019] Fig. 6A is a block diagram of one embodiment
of a memory queue manager.

[0020] Fig. 6B is a diagram illustrating various aspects
associated with an exemplary implementation of a write
request queue, along with related aspects of a write
request queue snoop logic unit.

[0021] Fig. 7 is a block diagram of one embodiment 
of a non-local memory (PCI/AGP) queue manager. 
[0022] Fig. 8 is a block diagram illustrating further details
of one embodiment of a CPU interface including a
fetch mechanism for implementing adaptive speculative
read operations.

[0023] Fig. 9 is a flow diagram illustrating aspects of 
one embodiment of an adaptive speculative read algo- 
rithm. 

[0024] While the invention is susceptible to various
modifications and alternative forms, specific embodiments
thereof are shown by way of example in the drawings
and will herein be described in detail. It should be
understood, however, that the drawings and detailed
description thereto are not intended to limit the invention
to the particular form disclosed, but on the contrary, the
intention is to cover all modifications, equivalents and
alternatives falling within the spirit and scope of the
present invention as defined by the appended claims.

[0025] Turning now to the drawings, Fig. 1 is a block
diagram of a computer system 100 including a CPU
(Central Processing Unit) 101 coupled to a variety of
system components through an integrated bridge logic
unit 102. In the depicted system, a main memory 104 is
coupled to bridge logic unit 102 through a memory bus
106, and a graphics controller 108 is coupled to bridge
logic unit 102 through an AGP bus 110. Finally, a plurality
of PCI devices 112 are coupled to bridge logic unit
102 through a PCI bus 114. A secondary bridge logic
unit 116 may further be provided to accommodate an
electrical interface to one or more EISA or ISA devices
118 through an EISA/ISA bus 120.
[0026] In addition to providing an interface to an ISA/EISA
bus, secondary bridge logic unit 116 may further
incorporate additional functionality, as desired. For example,
in one embodiment, secondary bridge logic unit
116 includes a master PCI arbiter (not shown) for arbitrating
ownership of PCI bus 114. Secondary bridge logic
unit 116 may additionally incorporate a disk drive controller,
an interrupt controller, and power management
support functionality. An input/output controller (not
shown), either external from or integrated with secondary
bridge logic unit 116, may also be included within
computer system 100 to provide operational support for
a keyboard and mouse 130 and for various serial and
parallel ports, as desired.

[0027] CPU 101 is illustrative of, for example, a Pentium®
Pro microprocessor. It is understood, however,
that in other embodiments of computer system 100, alternative
types of microprocessors could be employed.
An external cache unit (not shown) may further be coupled
to CPU bus 103 in other embodiments.
[0028] Main memory 104 is a memory in which application
programs are stored and from which CPU 101
primarily executes. A suitable main memory 104
comprises DRAM (Dynamic Random Access Memory),
and preferably a plurality of banks of SDRAM (Synchronous
DRAM).

[0029] PCI devices 112 are illustrative of a variety of
peripheral devices such as, for example, network interface
cards, video accelerators, audio cards, hard or floppy
disk drives, SCSI (Small Computer Systems Interface)
adapters and telephony cards. Similarly, ISA device
118 is illustrative of various types of peripheral devices,
such as a modem.

[0030] Graphics controller 108 is provided to control
the rendering of text and images on a display 135.
Graphics controller 108 may embody a typical graphics
accelerator generally known in the art to render three-dimensional
data structures which can be effectively
shifted into and from main memory 104. Graphics controller
108 may therefore be a master of AGP bus 110
in that it can request and receive access to a target interface
within bridge logic unit 102 to thereby obtain access
to main memory 104. A dedicated graphics bus accommodates
rapid retrieval of data from main memory
104. For certain operations, graphics controller 108 may
further be configured to generate PCI protocol transactions
on AGP bus 110. The AGP interface of bridge logic
unit 102 may thus include functionality to support both
AGP protocol transactions as well as PCI protocol target
and initiator transactions. Display 135 is any electronic
display upon which an image or text can be presented.
A suitable display 135 includes a cathode ray tube
("CRT"), a liquid crystal display ("LCD"), etc.
[0031] Turning next to Fig. 2, a block diagram of one
embodiment of bridge logic unit 102 is shown. The depicted
embodiment of bridge logic unit 102 includes a
CPU interface 204 coupled to a memory queue manager
206 and a PCI/AGP queue manager 208 (also referred
to as the NLM (non-local memory) manager). A
memory controller 210, a PCI interface 212, and an AGP
interface 214 are further shown coupled to memory
queue manager 206. The illustrated components of
bridge logic unit 102 may be embodied upon a single
monolithic integrated circuit chip.
[0032] As will be described in further detail below, all requests
to main memory 104, both reads and writes, are
processed through memory queue manager 206. Memory
queue manager 206 is configured to receive requests
from each of the depicted interfaces, arbitrate
between them, and appropriately load each request into
either a read request queue 220 or a write request
queue 222. Requests from read request queue 220 and
write request queue 222 are then provided to memory
controller 210 which subsequently orchestrates the
transfer of data to or from main memory 104. As illustrated,
read data resulting from memory read requests
may be returned directly to CPU interface 204 and AGP
interface 214 from memory controller 210.
[0033] Non-local memory requests from CPU 101 to
devices coupled to either PCI bus 114 or AGP bus 110,
as well as requests between AGP bus 110 and PCI bus
114, are processed through PCI/AGP queue manager
208. Non-local memory requests include interrupt acknowledge,
I/O cycles, configuration cycles, special cycles,
and memory cycles to an address range outside
of the main memory address range.
[0034] Generally speaking, the bridge logic unit 102
of Fig. 2 is configured to implement an adaptive speculative
read algorithm which optimizes prefetching of read
data associated with read requests of CPU 101. In one
embodiment, CPU interface 204 includes a prefetch
mechanism which fetches a single line of data from
memory 104 in response to an initial read request by
CPU 101. If CPU 101 requests a memory read to a sequential
line of data, the memory fetch unit requests
both that requested sequential line of data and the line
ahead of the requested line. This second, speculatively
fetched line of data is requested in anticipation that additional
sequential lines will ultimately be requested by
CPU 101. When CPU 101 in fact performs another sequential
read request and a hit to a speculatively
prefetched line occurs, the prefetch mechanism speculatively
fetches yet an additional sequential line. On the
other hand, if CPU 101 requests a non-sequential line,
the fetch unit only requests the requested line, and no
speculative fetches are requested.
[0035] The adaptive speculative read algorithm implemented
by bridge logic unit 102 may advantageously
result in improved hit rates with respect to the speculative
prefetching of data from memory since it is likely
that once a speculative access has occurred, subsequent
speculative accesses will follow. In situations
where non-sequential access patterns are prevalent,
only the needed data is typically fetched and memory
bandwidth is not wasted due to unneeded speculatively
prefetched data. On the other hand, in situations
where CPU 101 is performing generally sequential access
patterns, improved performance may be attained
by reducing memory latency. Further details regarding
the adaptive speculative read algorithm employed by
bridge logic unit 102 are provided below in conjunction
with Figs. 3 and 9.

[0036] Further aspects regarding a suitable implementation
of the various blocks illustrated in Fig. 2 will
next be discussed. Referring to Fig. 3, a block diagram
is shown of one embodiment of CPU interface 204. Generally
speaking, CPU interface 204 operates as a target
with respect to various transactions effectuated by CPU
101. In the illustrated embodiment, CPU interface 204
includes a CPU bus interface control unit 302 coupled
to an in-order queue 304 and to a read back buffer 306.
A CPU to memory transient buffer 308 and a CPU to
NLM transient buffer 310 are further illustratively coupled
to CPU bus interface control unit 302.

[0037] CPU bus interface control unit 302 is provided
to detect and track cycles being effectuated upon CPU
bus 103. In one embodiment in which CPU 101 is a Pentium®
Pro microprocessor, CPU bus interface control
unit 302 includes separate state machines for request
phase decoding, snoop tracking, response tracking and
data tracking. Since the Pentium® Pro microprocessor
allows multiple outstanding requests to be pipelined,
CPU bus interface control unit 302 may be configured
to track multiple cycles concurrently. In one embodiment,
up to four CPU bus cycles may be simultaneously
active.

[0038] As cycles are effectuated, requests from CPU 
101 are loaded in order within in-order queue 304. 
These requests may comprise read or write requests for 
access to main memory 104, and read or write requests
to non-local memory including I/O requests. It is noted 
that various other request types may further be accom- 
modated, such as various special cycles including flush 
cycles, interrupt acknowledge cycles, etc. depending 
upon the specific microprocessor employed in the im- 
plementation and the system requirements. In one em- 
bodiment, up to four requests may be pending within in- 
order queue 304 (corresponding to the up to four out- 
standing transactions that may be pending on CPU bus 
103). The removal or retiring of requests within in-order 
queue 304 is performed when a particular transaction is 
completed on CPU bus 103. 

[0039] CPU bus interface control unit 302 is further
configured to de-queue requests from in-order queue
304 and to decode the CPU cycles. CPU bus interface
control unit 302 determines if the CPU request is for access to
main memory 104, the GART (Graphics Adapter Remap
Table) region, AGP bus 110 or PCI bus 114. Furthermore,
CPU bus interface control unit 302 may determine
if the transaction can be accepted, posted, or if it has to
be retried.

[0040] Several buffers may be incorporated within 
CPU interface 204. CPU to memory transient buffer 308 
interfaces to memory queue manager 206, and in one
implementation is two cache lines deep. CPU to non- 
local memory (NLM) transient buffer 310 interfaces to 
the PCI/AGP queue manager 208. In one implementa- 
tion, CPU to NLM transient buffer 310 is also two cache 
lines deep. These buffers provide a simple mechanism 
for the CPU interface 204 to communicate to other mod- 
ules of the bridge logic unit 102 for read, write and other 
miscellaneous requests. 

[0041] CPU to memory transient buffer 308 provides 
an area where memory requests can be stored until they 
can be serviced by memory queue manager 206. Since 
CPU to memory transient buffer 308 may be two lines 
deep, memory queue manager 206 may read one loca- 
tion while another request is being loaded into the other 
location via in-order queue 304. The request information 
contained by CPU to memory transient buffer 308 in- 
cludes a request address, request type information, and 
write data (for write requests only). In one embodiment, 
memory queue manager 206 extracts data 64 bits at a
time from the data portions residing within CPU to mem- 
ory transient buffer 308. 



[0042] Various transactions from CPU 101 to either
AGP bus 110 or PCI bus 114 (discussed further below)
are communicated through CPU to NLM transient buffer
310 to PCI/AGP queue manager 208. In one implementation,
all requests to the PCI/AGP queue manager 208
are quadword (i.e., 64-bit) based only. Cache line
writes from CPU 101 occupy four locations in the data
portions of the CPU to NLM transient buffer, but only one
address. An individual request to the PCI/AGP queue
manager 208 is generated for each of the quadwords,
wherein the stored address is incremented by one after
each request.
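
The splitting of a cache line write into per-quadword requests can be illustrated as follows. This Python sketch is a hypothetical model: the names `QuadwordRequest` and `split_cache_line_write` are invented for the example, and it assumes the stored address is expressed in quadword units so that "incremented by one" advances to the next 64-bit location.

```python
from typing import List, NamedTuple

QW_MASK = (1 << 64) - 1  # a quadword carries 64 bits of data


class QuadwordRequest(NamedTuple):
    qw_address: int  # address in quadword units (assumption for this sketch)
    data: int        # 64-bit write data


def split_cache_line_write(qw_address: int,
                           quadwords: List[int]) -> List[QuadwordRequest]:
    """Turn one buffered cache line write (one address, four data
    locations) into four individual quadword requests."""
    if len(quadwords) != 4:
        raise ValueError("a cache line write occupies four quadword locations")
    requests = []
    for data in quadwords:
        requests.append(QuadwordRequest(qw_address, data & QW_MASK))
        qw_address += 1  # stored address is incremented after each request
    return requests


for req in split_cache_line_write(0x100, [0xA, 0xB, 0xC, 0xD]):
    print(hex(req.qw_address), hex(req.data))
```

Keeping a single stored address and incrementing it per request mirrors the text's point that the buffer holds four data locations but only one address.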

[0043] In one implementation, CPU to memory transient
buffer 308 may always request a full cache line of
data from main memory 104, even if the actual request
is a single quadword read. On the other hand, the CPU
to NLM transient buffer 310 only requests a quadword
of data at a time.

[0044] A feedback path for data read from main memory
104 is provided through read back buffer 306. A bypass
path 307 may further be provided to allow data to
bypass the read back buffer 306 and be directly driven
upon CPU bus 103. Furthermore, read data from PCI/AGP
queue manager 208 is provided upon a path 312.

[0045] CPU interface 204 may be configured such
that certain write cycles are always posted, and such
that other cycles are never posted. Similarly, certain
types of read cycles may result in snoop stalls, while
others will not. For example, in one implementation I/O
cycles are never posted, while memory cycles to main
memory 104 as well as to non-local memory are always
posted. I/O reads and non-local memory reads may result
in snoop stalls until data is ready since the cycle
may need to be retried under certain circumstances, as
discussed further below. On the other hand, reads to
main memory may not result in snoop stalls; rather, CPU
bus interface control unit 302 may simply withhold assertion
of the DRDY signal until the requested data is
available in read back buffer 306. It is noted that CPU
to memory transient buffer 308 and CPU to NLM transient
buffer 310 function as write posting buffers to allow
address and data from CPU 101 to be accumulated until
the appropriate queue manager can service the requests,
and also function as read request buffers where
multiple read cycles can be outstanding.

[0046] A snoop control unit 316 is finally illustrated
within CPU interface 204. Snoop control unit 316 is configured
to generate snoop transactions on CPU bus 103
to ensure memory coherency during PCI cycles to main
memory 104. In certain situations where a writeback of
modified data from CPU 101 (or an external cache unit)
occurs, snoop control unit 316 may merge the line
of writeback data with the write data to memory from the
PCI bus 114. Writeback data may further be snarfed in
response to a PCI memory read operation to allow the
writeback data to be directly provided to PCI bus 114
through PCI interface 216.

[0047] Turning next to Fig. 4A, a block diagram illustrating
aspects of one suitable embodiment of PCI interface
216 is shown. PCI interface 216 generally includes
a PCI interface master control unit 402 coupled between
PCI bus 114 and PCI/AGP queue manager 208. PCI interface
master control unit 402 is configured to initiate
transactions on PCI bus 114 on behalf of CPU initiated
transactions or AGP write transactions targeted to PCI
bus 114. As stated previously, CPU and AGP initiated
transactions targeted to PCI bus 114 communicate to
the PCI interface 216 through PCI/AGP queue manager
208. When a request to read or write data to PCI bus
114 is received by PCI interface master control unit 402,
PCI interface master control unit 402 arbitrates for the
PCI bus 114 and initiates a transaction on PCI bus 114.
Address, byte enable, transaction type description, and
data (for write transactions) are passed from the PCI/AGP
queue manager 208 to the PCI interface master
control unit 402 to accommodate effectuation of the
proper cycle on PCI bus 114.

[0048] The transfer of requests from PCI/AGP queue
manager 208 to PCI interface 216 may be based on
quadword transfers. Cache line transfers are transferred
as four separate quadwords. Byte enables are
further passed to the PCI interface master control unit
402 and are utilized to ultimately decide the size of a
data transfer on PCI bus 114. PCI interface master control
unit 402 may multiplex either the lower or upper four
byte enables to PCI bus 114 depending on the asserted
byte enables. If all the byte enables are asserted, PCI
interface master control unit 402 may convert the quadword
transfer into a burst of two doublewords on PCI
bus 114 (since the PCI bus has a data width of 32 bits).
If either the four upper or four lower byte enables are
deasserted, the PCI interface master control unit 402
may drive the request from PCI/AGP queue manager
208 as a single doubleword transfer on PCI bus 114. It
is noted that PCI interface master control unit 402 may
further support write combining of sequential write data
from CPU bus 103 or AGP bus 110.
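
The byte-enable decision described above can be sketched as a small function. The following Python is illustrative only: the name `pci_transfers`, the active-high 8-bit mask encoding, and the handling of the case where both halves are only partially enabled (treated here as a two-doubleword burst) are assumptions not stated in the text.

```python
def pci_transfers(byte_enables):
    """Map the eight byte enables of a quadword request onto 32-bit PCI
    data phases. Bit i set means byte i is enabled (assumed encoding).
    Returns a list of (half, four_bit_enables) tuples, one per data phase."""
    if not 0 <= byte_enables <= 0xFF:
        raise ValueError("expected an 8-bit byte-enable mask")
    lower = byte_enables & 0x0F         # enables for bytes 0..3
    upper = (byte_enables >> 4) & 0x0F  # enables for bytes 4..7
    if upper == 0:
        # Upper four enables deasserted: single lower doubleword transfer.
        return [("lower", lower)]
    if lower == 0:
        # Lower four enables deasserted: single upper doubleword transfer.
        return [("upper", upper)]
    # Enables asserted in both halves: burst of two doublewords.
    return [("lower", lower), ("upper", upper)]


print(pci_transfers(0xFF))  # burst of two doublewords
print(pci_transfers(0x0F))  # single lower doubleword
print(pci_transfers(0xF0))  # single upper doubleword
```

The two-phase result for a fully enabled quadword follows from the 32-bit data width of the PCI bus noted in the paragraph.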
[0049] PCI/AGP queue manager 208 and PCI interface
master control unit 402 may employ a simple request/acknowledge
protocol to control the flow of transactions
between the two interfaces. Separate request
and acknowledge signals may further be employed to
control the transfer of data between the AGP interface
214 and PCI interface 216.

[0050] Fig. 4B is a block diagram of an implementa- 
tion of a PCI master transient read buffer employed with- 
in PCI interface master control unit 402. As illustrated, 
read data from the multiplexed address/data lines 422 
of PCI bus 114 are provided to a pair of multiplexers 424 
and 426. Depending upon the 64-bit quadword to which 
the read data aligns, the data is latched on a given clock 
within either latch 428 or 430. In this manner, 32-bit doubleword
information from PCI bus 114 is quadword
aligned for receipt by CPU interface 204.
[0051] Fig. 4C illustrates a block diagram of an implementation
of a PCI master transient write buffer which
may be employed within PCI interface master control
unit 402. Similar to the PCI master transient read buffer,
the PCI master transient write buffer of Fig. 4C selects
either the upper doubleword or the lower doubleword of
write data from PCI/AGP queue manager 208 to be driven
upon the multiplexed address/data lines 422 of PCI
bus 114. In the depicted implementation, 64-bit data is
stored on a given clock within flip-flops 440 and 442
through multiplexers 444 and 446, respectively. The appropriate
doubleword of data being written is then selected
through multiplexer 448 and through multiplexer
450 to be driven upon PCI bus 114 through flip-flop 452.
It is noted that address information may be selected
through multiplexer 450 to be driven on the multiplexed
address/data lines 422 of PCI bus 114 during the address
phases of PCI transactions, and that read data,
when PCI interface 216 is operating as a slave, may similarly
be selected through multiplexer 450 during slave-mode
read cycles, as discussed further below.

[0052] Turning back to Fig. 4A, PCI interface 216 further
includes a slave interface 410 which accepts transactions
targeted for main memory 104, the PCI configuration
address base within bus bridge unit 102, memory
writes targeted toward AGP bus 110, and cycles to
the memory mapped AGP control registers. Slave interface
410 illustratively includes a PCI interface control
unit 412 coupled to a PCI slave address buffer 414, a
PCI slave transient read buffer 416, and a PCI slave
transient write buffer 418.

[0053] When the FRAME_ signal is asserted on PCI
bus 114, indicating the start of a PCI transaction, the address
of the transaction is stored within PCI slave address
buffer 414. PCI interface slave control unit 412
further receives command information from PCI bus 114
indicating the type of cycle being effectuated. The PCI
interface slave control unit 412 is configured to decode
the command and address information to determine if
the transaction is targeted to bus bridge unit 102 and
asserts the DEVSEL_ signal to claim the cycle when appropriate.
As each address is stored in PCI slave address
buffer 414, the PCI address will be decoded to
determine when graphics address translation is required.
If the PCI address is within the bounds of the
virtual Graphics Address Range defined by the GART
(Graphics Adapter Remap Table) mechanism (not
shown), the PCI slave interface 410 indicates to the
memory queue manager 206 that address translation
is required for this request.

[0054] If the PCI transaction is targeted for main memory
104, slave interface 410 will either provide data for read
transactions, begin accepting data for write transactions,
or retry the PCI bus transaction. For PCI memory read
transactions, the PCI slave interface performs PCI "delayed
read" transactions. During a PCI delayed read transaction,
the slave interface 410 requests the read data by providing
a request to memory queue manager 206 and retries (e.g.,
through the PCI STOP_ signal) the PCI read transaction until
data has been returned from memory queue manager 206. For
PCI memory write transactions, data is accepted into the PCI
slave transient write buffer 418 once the PCI transaction
has been positively decoded. A corresponding request
including the valid PCI write data is subsequently provided
to the memory queue manager 206 when either a full cache
line has been accepted into the PCI slave transient write
buffer 418 or the PCI bus transaction ends. PCI interface
slave control unit 412 may additionally provide a snoop
request to memory queue manager 206 with each PCI master
access to a new cache line in main memory 104. This snoop
request is asserted to maintain cache coherency.

[0055] Turning next to Fig. 4D, a diagram illustrating
aspects of an exemplary implementation of PCI slave
transient read buffer 416 is shown. For the implementation
of Fig. 4D, PCI slave transient read buffer 416 includes a
16-by-32 bit read buffer for accepting up to two cache lines
of read data requested by a PCI master. The read buffer is
used to accept valid data from memory queue manager 206
which is sourced from either data fetched from main memory
104 or from CPU writeback data that resulted from a snoop
hit to a dirty cache line. If a PCI master requests data
from main memory 104 and it is determined that a modified
line resides in the cache memory upon effectuation of a
snoop transaction upon CPU bus 103, the memory queue manager
206 may return data from the CPU writeback transaction
before the writeback data is written to main memory 104. If
a PCI master requests data from main memory 104 and the
cache line is clean, memory queue manager 206 returns data
fetched from main memory 104. In one implementation, an
entire cache line of data is always requested from memory
queue manager 206 regardless of the PCI read command type
(i.e., memory read, memory read multiple, or memory read
line).
[0056] As illustrated by Fig. 4D, PCI slave transient read
buffer 416 aligns read data with a cache line boundary. This
alignment is supported by a set of multiplexers 460A-460H.
Therefore, data is always returned from memory in a linear
fashion and will update eight entries in PCI slave transient
read buffer 416. As quadwords are provided from memory queue
manager 206, they are routed through multiplexers 460A-460H
to a corresponding pair of 32-bit registers (i.e., register
pairs 462A-462H) which correspond to respective quadword
positions in a given pair of lines. Since there are a total
of sixteen 32-bit storage registers within the transient
read buffer, up to two cache lines of read data may be
stored. This advantageously allows PCI interface slave
control unit 412 to prefetch data in anticipation of a PCI
master crossing a cache line boundary, while providing data
from a current line to PCI bus 114. It is noted that
selected 32-bit data from one of register pairs 462A-462H
requested during a particular PCI read transaction may be
selected and provided through a multiplexer 464 and passed
through multiplexer 450 and flip-flop 452 to the multiplexed
address/data lines 422 of PCI bus 114.
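
It is noted that the register organization described above
may be pictured with the following minimal sketch. The
mapping shown (a simple modulo of the byte address) and the
name register_slot are assumptions made purely for
illustration and are not taken from Fig. 4D; the sketch only
shows that eight quadword slots of two 32-bit registers each
can hold any two consecutive cache lines without collision,
assuming four quadwords per cache line.

```python
QUADWORD_BYTES = 8   # one quadword = two 32-bit registers
PAIRS = 8            # register pairs 462A-462H


def register_slot(byte_addr):
    """Map a byte address to (register pair index, 32-bit half).

    Hypothetical mapping: the pair is chosen by quadword position
    modulo eight, the half by the doubleword within the quadword.
    """
    pair = (byte_addr // QUADWORD_BYTES) % PAIRS
    half = (byte_addr // 4) % 2
    return pair, half


def slots_for_two_lines(first_line_addr):
    """Pairs used by two consecutive 32-byte lines (8 quadwords)."""
    return [register_slot(first_line_addr + q * QUADWORD_BYTES)[0]
            for q in range(8)]
```

Under this assumed mapping, two consecutive lines always
occupy eight distinct register pairs, whether or not the
first line is aligned to a two-line boundary.
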



[0057] Referring back to Fig. 4A, when a PCI memory read is
targeted for main memory 104, PCI interface slave control
unit 412 checks the contents of PCI slave transient read
buffer 416 for valid read data. If valid read data
corresponding to the request exists in PCI slave transient
read buffer 416, the data is provided to PCI bus 114 during
the data phase of the PCI transaction. If valid read data
does not exist in PCI slave transient read buffer 416, PCI
interface slave control unit 412 normally causes the PCI
memory read transaction to be retried (e.g., using the PCI
STOP_ signal). PCI interface slave control unit 412 further
requests a cache line containing the read data from memory
queue manager 206 if a read request (either speculatively
generated or from a previously retried (delayed)
transaction) is not already outstanding within the memory
queue manager 206 or if valid read data from a previously
retried (delayed) transaction is not present in the PCI
slave transient read buffer 416. Subsequent attempts to read
the same data by the PCI master will again cause PCI
interface slave control unit 412 to retry the transaction if
the data is still not available in the PCI slave transient
read buffer 416 (or instead if the snoop phase of the snoop
cycle corresponding to the pending delayed read cycle is not
yet complete, as discussed below). If the PCI master
reinitiates the read request and the read data has been
stored in PCI slave transient read buffer 416, the data is
provided during that PCI read cycle.
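
By way of illustration only, the delayed read handling just
described may be sketched as follows. The class and function
names, the string results "DATA" and "RETRY", and the 32-byte
line size are hypothetical and chosen for the example; snoop
phases, speculative requests and the discard counter are
deliberately omitted.

```python
from dataclasses import dataclass, field

LINE_BYTES = 32  # assumed: four quadwords per cache line


@dataclass
class SlaveReadState:
    """Hypothetical model of the slave read path."""
    valid_lines: set = field(default_factory=set)   # lines held in the read buffer
    outstanding: set = field(default_factory=set)   # lines already requested
    requests_issued: list = field(default_factory=list)


def pci_memory_read(state, addr):
    """Handle one PCI memory read attempt targeted at main memory."""
    line = addr & ~(LINE_BYTES - 1)
    if line in state.valid_lines:
        return "DATA"                      # data phase completes
    if line not in state.outstanding:
        state.outstanding.add(line)        # request the line only once
        state.requests_issued.append(line)
    return "RETRY"                         # master must reinitiate the read


def memory_returns(state, line):
    """Model the memory queue manager returning a requested line."""
    state.outstanding.discard(line)
    state.valid_lines.add(line)
```

In this sketch a master that repeats its read before the line
arrives is simply retried again, and no second request is
issued for the same line.
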
[0058] PCI interface slave control unit 412 may be
configured such that it does not retry the PCI master if the
read cycle matches a pending delayed read cycle and the
snoop phase of the snoop cycle is over. Instead, during this
condition the PCI slave negates TRDY_ until the requested
data is available. The master may also be held in wait
states during a burst read transfer that spans several cache
lines if the snoop phase of the snoop cycle of a speculative
read request is over. This may advantageously minimize
arbitration latencies and optimize back to back cache line
reads.
[0059] In addition, PCI interface slave control unit 412 may
not support multiple delayed read transactions concurrently.
In such an embodiment, any attempt by a second PCI master to
read from main memory while a delayed read transaction is
pending will be retried until the first PCI master
reinitiates its read transaction and completes at least one
data transfer. If the first PCI master reinitiates its read
transaction and leaves the data in the PCI slave transient
read buffer 416, the remaining data is marked speculative by
PCI interface slave control unit 412. PCI interface slave
control unit 412 asserts a snoop request coincident with
each cache line read request to the memory queue manager 206
to maintain cache coherency. Once the requested read data is
returned from the memory queue manager 206, a request
corresponding to a previously retried (delayed) read
transaction is accepted and read data is provided to the PCI
master.
[0060] PCI interface slave control unit 412 may still
further be configured to control the prefetching of data
from main memory 104. In one specific implementation, when a
PCI memory read line command or a PCI memory read multiple
command is targeted for main memory 104, the PCI interface
slave control unit 412 immediately requests two cache lines
of data from main memory 104 through memory queue manager
206. In anticipation of the PCI master reading multiple
cache lines of data, PCI interface slave control unit 412
performs additional speculative read requests as space
becomes available in PCI slave transient read buffer 416. By
prefetching data from main memory 104, slave interface 410
can advantageously overlap a read request to memory queue
manager 206 with data transfers on PCI bus 114 to achieve
higher data transfer performance.
[0061] Speculative read data is sequential data in PCI slave
transient read buffer 416 which was requested purely in
anticipation of the PCI master reading the next sequential
memory address. When a PCI master terminates a transaction
without reading all of the sequential data in PCI slave
transient read buffer 416, the remaining data is marked as
residual speculative read data. The remaining data in the
PCI slave transient read buffer 416 may not be marked as
residual speculative data if the master, during the last
read transfer, did not have all of the byte enables set. The
residual speculative data is invalidated in the PCI slave
transient read buffer 416 in response to various conditions.
For example, residual speculative data may be invalidated if
a PCI master memory read line or multiple transaction is
attempted to a non-sequential memory address, a PCI memory
read (normal) transaction is attempted to main memory, a CPU
to PCI write transaction is detected (since a CPU to PCI
write transaction is considered to be a synchronization
event), or upon a PCI to memory write that hits the two
cache line address space where speculative data resides. In
addition, data residing in PCI slave transient read buffer
416 may be marked invalid due to lapse of a discard counter
employed to discard delayed read data (being held in PCI
slave transient read buffer 416) in the event the master has
not repeated a previously retried request establishing the
delayed read within a predetermined period of time, as
controlled by the discard counter.
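
The exemplary invalidation conditions listed above may be
summarized by the following sketch. The event names, the
function name and its parameters, and the 32-byte line size
are assumptions introduced for illustration; the discard
counter, which concerns delayed read data, is not modeled.

```python
LINE_BYTES = 32  # assumed cache line size


def invalidates_residual(event, addr=None, next_seq_addr=None, spec_base=None):
    """Return True if the event discards residual speculative data.

    event         -- hypothetical event label (see branches below)
    addr          -- address of the new transaction, where relevant
    next_seq_addr -- next sequential address expected by the buffer
    spec_base     -- base of the two-line window holding the data
    """
    if event in ("read_line", "read_multiple"):
        return addr != next_seq_addr            # non-sequential access
    if event == "read":                         # normal memory read
        return True
    if event == "cpu_to_pci_write":             # synchronization event
        return True
    if event == "pci_to_memory_write":          # write into the window
        return spec_base <= addr < spec_base + 2 * LINE_BYTES
    return False
```
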

[0062] PCI slave transient write buffer 418 of slave
interface 410 allows for the posting of up to two cache
lines of write data from a PCI master. By providing up to
two cache lines of write data buffering, slave interface 410
may advantageously overlap the acceptance of write data from
PCI bus 114 with data transfers to memory queue manager 206
or to the PCI/AGP queue manager 208. When valid write data
is present on PCI bus 114 (i.e., IRDY is asserted), the data
and byte enables are accepted into PCI slave transient write
buffer 418.
[0063] PCI slave transient write buffer 418 operates in
either a memory queue manager mode or in an NLM mode. In the
memory queue manager mode, PCI interface slave control unit
412 may transfer data to the memory queue manager 206 one
cache line at a time regardless of whether the PCI bus
transfer size is one byte or one cache line. The byte
enables for bytes not transferred on PCI bus 114 are
deasserted when passed to the memory queue manager 206. Once
a cache line in PCI slave transient write buffer 418 is
full, or as soon as the PCI master is finished with the
write transfer to memory, a valid write data request and
byte enables are provided to memory queue manager 206.
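
The handling of byte enables in the memory queue manager
mode may be illustrated with the short sketch below. The
function name, the representation of a PCI transfer as an
(address, bytes) tuple, and the 32-byte line size are
assumptions made for the example only.

```python
LINE_BYTES = 32  # assumed cache line size


def build_line_write(line_addr, transfers):
    """Assemble one cache-line write request with byte enables.

    transfers -- list of (byte_addr, payload) pieces actually written
                 by the PCI master; payload is a bytes object.
    Returns (data, enables); enables[i] is False for every byte that
    was not transferred on the bus.
    """
    data = bytearray(LINE_BYTES)
    enables = [False] * LINE_BYTES
    for addr, payload in transfers:
        for i, value in enumerate(payload):
            offset = addr + i - line_addr
            if 0 <= offset < LINE_BYTES:   # ignore bytes outside this line
                data[offset] = value
                enables[offset] = True
    return bytes(data), enables
```

A two-byte write thus still produces a full-line request, with
only two byte enables asserted.
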
[0064] In the NLM mode, PCI slave transient write buffer 418
transfers data to the PCI/AGP queue manager 208 one quadword
at a time. Once a cache line in the PCI slave transient
write buffer 418 is full, or as soon as the PCI master is
finished with its write transfer (e.g., to the AGP bus 110),
the request in the PCI slave transient write buffer 418 is
transferred to PCI/AGP queue manager 208. The transfer of
cache lines to the PCI/AGP queue manager 208 may be
optimized by notifying the PCI/AGP queue manager 208 that
PCI interface 216 is performing cache line writes. In the
cache line mode, the PCI/AGP queue manager 208 parks on the
PCI slave interface 410 until the cache line is fully
transferred.

[0065] When a PCI memory write is targeted for main memory
104, slave interface 410 immediately begins accepting write
data from PCI bus 114. Slave interface 410 posts data from
PCI bus 114 into PCI slave transient write buffer 418 with
the assertion of DEVSEL_. Slave interface 410 may
additionally support the posting of sequential burst writes
into PCI slave transient write buffer 418 at zero wait
states.

[0066] A write request may be asserted to the memory queue
manager 206 by slave interface 410 when the PCI write
transaction is initially positively decoded and when the PCI
master writes to a new cache line during a burst
transaction, indicating that the PCI address should be
snooped. The memory queue manager 206 ensures that all
snooping has been completed and that any modified data in
the memory write data queue, CPU to memory transient buffer,
or the CPU cache is written to main memory before PCI write
data is written to main memory. Data merging may be employed
in situations where writeback data is provided from cache
memory.
[0067] When a PCI memory write and invalidate command is
targeted for main memory 104, the PCI slave interface 410
treats the command similarly to the PCI memory write
command; however, PCI interface slave control unit 412 may
be configured to provide a writeback and invalidate
indication to memory queue manager 206 coincident with the
write request. The CPU interface 204 and memory queue
manager 206 can then use this condition to ignore the
writeback data from CPU 101 on a hit to a dirty cache line.
[0068] Finally, PCI interface slave control unit 412 may be
configured to provide a control signal to CPU interface 204
through PCI/AGP queue manager 208 to enable or disable CPU
to PCI write posting. This control signal may advantageously
allow the PCI interface 216 to prevent data coherency and
latency problems. In one suitable implementation, CPU to PCI
write posting (in CPU to NLM transient buffer 310) is
disabled when a PCI master establishes a delayed read from
main memory, and remains disabled until the snoop phase of
the snoop cycle completes on CPU bus 103 and the CPU to PCI
posting buffer is sampled empty. Write posting may further
be disabled when the flush request signal FLSHREQ_ is
asserted on PCI bus 114.
[0069] Referring back to Fig. 2, memory controller 210 
is next considered in further detail. Memory controller 
210 is configured to process requests received from 
memory queue manager 206 and to correspondingly ac- 
cess locations within main memory 104. In one embod- 
iment, memory controller 210 supports synchronous 
DRAM, and is preferably implemented as a non-inter- 
leaved, non-parity, non-ECC memory controller. The 
memory controller timing may be programmable and 
may support address pipelining. Furthermore, the mem- 
ory controller 210 may support multiple physical banks 
of memory. Memory controller 210 may also be config- 
ured to support a variety of additional functions, such as 
paging support and refresh, as desired. 
[0070] Memory controller 210 services requests from 
memory queue manager 206 via read request queue 
220 and write request queue 222. For a write request, 
memory controller 210 takes data from a designated 
write request queue entry (e.g., the entry at the "head 
of queue") and generates an associated access to main 
memory 104. For a read request, memory controller 210 
retrieves data from main memory 104 and provides it for
transfer to the requesting interface.
[0071] In one embodiment, memory controller 210 
services requests pending within read request queue 
220 and does not service requests in write request 
queue 222 until a predetermined plurality of write re- 
quests have become pending within write request 
queue 222. Specifically, memory queue manager 206 
may be configured to generate a control signal referred 
to as WrReqAlmostFull which, when asserted, indicates 
that the write request queue 222 is nearly full. When this 
control signal is not asserted, memory controller 210 
services requests from only read request queue 220, 
thereby providing a higher priority for read requests. 
When the WrReqAlmostFull signal is asserted, memory 
controller 210 begins to toggle between servicing a re- 
quest (or multiple requests) from the read request queue 
220 and then a request (or multiple requests) from write 
request queue 222 in a ping-pong fashion until the 
WrReqAlmostFull signal is deasserted. In this manner,
write requests are serviced to allow write request queue 
222 to receive additional memory write requests. In one 
embodiment, the WrReqAlmostFull signal is asserted 
when five pending requests reside in write request 
queue 222. 
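
The read-priority policy of this embodiment may be sketched
as follows. The sketch assumes, for illustration only, that
WrReqAlmostFull is a pure function of queue occupancy (no
hysteresis), that exactly one request is serviced per step,
and that the toggling begins with a read; the function name
and the "R"/"W" encoding are hypothetical.

```python
ALMOST_FULL = 5  # per the embodiment: asserted with five pending writes


def service_order(num_reads, num_writes, steps):
    """Return the sequence of queues serviced, e.g. "RWRW".

    Reads are serviced exclusively until WrReqAlmostFull is asserted;
    while it is asserted, reads and writes alternate (ping-pong).
    """
    order, last = [], None
    for _ in range(steps):
        almost_full = num_writes >= ALMOST_FULL
        if almost_full and (last == "R" or num_reads == 0):
            num_writes -= 1
            pick = "W"
        elif num_reads > 0:
            num_reads -= 1
            pick = "R"
        else:
            break                      # nothing eligible to service
        order.append(pick)
        last = pick
    return "".join(order)
```

With five reads and six writes pending, the sketch alternates
until the write queue drops below the threshold and then
returns to servicing reads only.
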

[0072] Aspects relating to one embodiment of AGP interface
214 will next be discussed in conjunction with Fig. 5. In
the depicted embodiment, AGP interface 214 is configured to
provide an external interface to a 66-MHz 32-bit AGP/PCI
bus. Internally, AGP interface 214 interfaces to memory
queue manager 206, memory controller 210 and PCI/AGP queue
manager 208. AGP interface 214 may be configured to support
both AGP protocol transactions as well as PCI-protocol
transactions (e.g., 66 MHz PCI type transactions).
[0073] As illustrated, AGP interface 214 includes an AGP
slave interface 502 having an AGP interface slave control
unit 504 coupled to an AGP slave transient read buffer 506,
an AGP slave transient write buffer 508, an address decode
and queue unit 510, and an AGP arbiter 511. AGP interface
214 further includes a PCI-mode interface 514 illustratively
comprising a master module 516 and a slave module 518.

[0074] Since, in the illustrated embodiment, AGP bus 110 is
a shared resource for both PCI protocol transactions and AGP
protocol transactions, AGP arbiter 511 is provided to
support the shared use of the bus by both protocols.
Specifically, AGP arbiter 511 arbitrates between agents
requesting to perform PCI-mode transactions on AGP bus 110
and agents requesting AGP protocol transactions. PCI-mode
interface 514 is configured to support both master and slave
functionality for PCI transactions on AGP bus 110, and can
be configured similarly to the PCI interface 216 discussed
above in conjunction with Figs. 4A-4D. Like PCI interface
216, PCI-mode interface 514 may be configured to pass memory
requests to memory queue manager 206 and NLM requests to
PCI/AGP queue manager 208. In addition, the PCI mode master
interface runs cycles on the PCI/AGP bus on behalf of PCI
write transactions targeted to the PCI/AGP bus.

[0075] For AGP transactions, when an AGP request is asserted
on AGP bus 110, the address, command type and transfer
length are received by slave interface 502 via address
decode and queue unit 510. As additional requests are
initiated by an external AGP master, each request is stacked
up behind the previous request in the AGP slave address
decode and queue unit 510. It is noted that when multiple
requests are stacked up in the address decode and queue unit
510, the AGP requests may be retired out of order.
[0076] An AGP write request is retired as the data is
accepted into the AGP transient write buffer 508. AGP read
requests are retired when read data is provided to the AGP
bus 110 from the AGP transient read buffer 506. In one
embodiment, a total of up to four pending requests may
reside in address decode and queue unit 510. It is
contemplated, however, that differing numbers of requests
may be queued within slave interface 502, as desired.

[0077] As each address is stored in slave interface 502, the
AGP address will be decoded to determine whether graphics
address translation is required. If the AGP address is
within the bounds of the virtual graphics address range
defined by the GART (Graphics Adapter Remap Table) mechanism
(not shown), the AGP slave interface 502 indicates to the
memory queue manager 206 that address translation is
required for this request based on an entry in the graphics
adapter remap table in main memory 104. It is noted that
entries of the graphics adapter remap table may be cached
within a separate GART cache module (also not shown)
residing within bridge logic unit 102.

[0078] As will be described in further detail below, 
while requests from AGP slave interface 502 are not
snooped on CPU bus 103 (since the cycles are non-cacheable),
memory queue manager 206 may be configured to snoop all
AGP read requests in write request
queue 222 to main memory 104. This ensures that an 
AGP read request will be coherent with a previously is- 
sued AGP write request to the same address, where 
write data is still present in the write request queue 222. 
If an AGP read request hits an address present in the 
write request queue 222, memory queue manager 206 
flushes the write request queue 222 to main memory 
104 until the snoop hit condition is no longer present 
before issuing the read request to main memory 104. 
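
The flush-before-read behavior described above may be
sketched as follows. The representation of the write request
queue as a deque of (line address, data) tuples and of main
memory as a dictionary, as well as the function name, are
assumptions made solely for illustration.

```python
from collections import deque


def issue_agp_read(read_addr, write_queue, memory):
    """Service an AGP read coherently with pending writes.

    write_queue -- deque of (line_addr, data); index 0 is the head
    memory      -- dict mapping line_addr to data
    While any queued write matches the read address, writes are
    retired from the head of the queue; only then is the read issued.
    """
    while any(addr == read_addr for addr, _ in write_queue):
        addr, data = write_queue.popleft()   # flush the head write
        memory[addr] = data
    return memory.get(read_addr)
```

Writes queued behind the matching entry remain pending, since
the flush stops as soon as the snoop hit condition clears.
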
[0079] In one embodiment, AGP slave transient read 
buffer 506 includes a 32-by-32 bit transient read buffer 
for accepting up to four cache lines of read data from 
main memory 104 requested by an AGP master. Slave 
interface 502 requests read data from memory queue 
manager 206 in multiples of four, eight, twelve or sixteen 
quadwords (i.e., 1, 2, 3 or 4 cache lines) based on the
AGP requested address and transfer length. By provid- 
ing up to four cache lines of read data, the AGP slave 
interface can overlap AGP read requests to the memory 
queue manager 206 with read data transfers on AGP 
bus 110. 
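
One plausible way of deriving the request size from the AGP
address and transfer length is sketched below. The
specification does not state the rounding rule; the sketch
assumes that the transfer length is expressed in quadwords
(at least one), that a request is rounded out to cache line
boundaries, and that it is capped at four lines. All names
are hypothetical.

```python
QUADWORD_BYTES = 8
LINE_QUADWORDS = 4
LINE_BYTES = QUADWORD_BYTES * LINE_QUADWORDS   # 32 bytes per cache line
MAX_LINES = 4                                  # capacity of the read buffer


def lines_to_request(addr, length_quadwords):
    """Number of cache lines (1 to 4) touched by an AGP read."""
    first = addr // LINE_BYTES
    last = (addr + length_quadwords * QUADWORD_BYTES - 1) // LINE_BYTES
    return min(last - first + 1, MAX_LINES)
```

For example, a two-quadword read beginning at the last
quadword of a line touches two lines under these assumptions.
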

[0080] Similarly, in one embodiment AGP slave transient
write buffer 508 comprises a 32-by-32 bit transient write
buffer for accepting up to four cache lines of write data
from an AGP master. By providing up to four cache lines of
write data buffering, the AGP slave interface 502 can
overlap the acceptance of write data from an AGP master with
write data transfers to the memory queue manager 206. When
valid write data is present on the AGP bus 110, the data and
byte enables are accepted into AGP slave transient write
buffer 508. AGP interface slave control unit 504 analyzes
the amount of data stored in the AGP slave transient write
buffer 508 to determine the transfer size to memory queue
manager 206. Data is transferred to the memory queue manager
206 in multiples of four or eight quadwords (one or two
cache lines) based on the AGP address and transfer length.

[0081] Turning next to Fig. 6A, aspects of one suitable
embodiment of memory queue manager 206 are next considered.
As stated previously, read memory requests from CPU
interface 204, PCI interface 216, and AGP interface 214 are
loaded into read request queue 220, while memory write
requests are loaded into write request queue 222.
Corresponding write data is stored within a write data queue
602. The loading of read requests and write requests, as
well as various other functionality, as desired, is
supported by control logic depicted generally as queue
memory control unit 624. Various data paths 615 are provided
between the request queues and the depicted device
interfaces to accommodate the routing of requests. As will
be described in further detail below, a memory queue arbiter
626 is further provided within memory queue manager 206 to
arbitrate between pending requests of CPU interface 204, PCI
interface 216 and AGP interface 214. A write request queue
(WRQ) snoop logic unit 610 and a read request queue (RRQ)
snoop logic unit 612 are further provided to maintain
coherency, as will also be discussed further below.

[0082] In one specific implementation, write request queue
222 is configured to store up to eight write requests
concurrently. Each write request corresponds to four
quadwords (i.e., one cache line) of write data. Separate
portions of each of the eight locations of write request
queue 222 may be provided to store chip selects, bank
selects and row addresses, and column addresses. By
partitioning each request location of write request queue
222 in this manner, memory controller 210 may advantageously
de-queue portions as it requires them. To facilitate
snooping, in one implementation, write request queue 222 is
implemented as a register bank.
[0083] Fig. 6B illustrates various aspects associated with
an exemplary implementation of write request queue 222,
along with related aspects of a write request queue snoop
logic 610. Write request queue 222 is shown with a plurality
of registers 650A-650D illustrative of various storage
locations comprised within write request queue 222. As noted
previously, in one implementation, a total of eight such
storage locations may be provided, although only four are
included in the illustration. As the request at the head of
the queue (i.e., residing in register 650A) is serviced by
memory controller 210, the remaining requests in the other
registers are shifted one position to the right such that a
new request will appear at the head of the queue formed by
register 650A. Memory queue arbiter 626 arbitrates between
pending requests in the various bridge interfaces and
determines the next available storage register in which the
write request may be placed. As illustrated in the drawing,
memory queue arbiter 626 may select either a CPU write
request from CPU interface 204 or a PCI write request from
PCI interface 216 for loading into a register of the write
request queue 222 forming the current tail of queue. Memory
queue arbiter 626 may further select from requests from
other interfaces, such as AGP interface 214 and others, such
as a USB bus or an IEEE 1394 bus, if provided. Advancement
of the requests from the tail of the queue to the head of
the queue is controlled by portions of the functionality of
queue memory control unit 624. Finally, the de-queueing of
requests from the head of the queue is controlled by a
memory arbiter 660.
[0084] Fig. 6B finally illustrates various aspects relating
to a suitable implementation of portions of the write
request queue snoop logic unit 610. As illustrated in Fig.
6B, a plurality of comparators 662A-662D are provided to
compare the address of each valid request residing in
registers 650A-650D with the address of a new read request
which is provided to a register or port 664. Logic unit 666
generates a signal indicating whether there is a hit in any
of the write request queue locations. Further aspects
regarding the snooping operations associated with memory
queue manager 206 will be discussed in further detail below.

[0085] As stated previously, memory controller 210 normally
services read requests pending within read request queue 220
with a higher priority than write requests pending within
write request queue 222. Referring collectively to Figs. 6A
and 6B, as long as the number of pending write requests
within write request queue 222 is below a threshold number,
memory arbiter 660 will cause memory controller 210 to
select only read requests from read request queue 220. When
the number of write requests pending in the write request
queue 222 reaches a threshold number, write request queue
222 asserts the write request almost full signal
(WrReqAlmostFull) to indicate that memory controller 210
should start servicing write requests. From that point,
requests are serviced from both the write request queue 222
and read request queue 220 in a ping-pong fashion until the
write request almost full signal is deasserted.
[0086] Write data queue 602 stores data associated with each
write request. In one implementation, write data queue 602
can store up to eight cache lines of write data and byte
enables. It is noted that data may be stored in the write
data queue 602 in a specific burst order (such as that of
CPU 101) to thereby optimize performance.

[0087] In one implementation, read request queue 220 is
configured to store up to four pending read requests from
the various interfaces of the bus bridge. It is
contemplated, however, that read request queue 220 could be
configured to store alternative numbers of pending requests
depending upon the number of overall interfaces serviced by
the memory queue manager 206 and upon performance
requirements. It is noted that, like the write request queue
222, the request storage locations of read request queue 220
may be split into several sections, one for chip selects,
another for bank selects and row addresses, and the other
for column addresses, request sizes and read destinations,
to allow memory controller 210 to selectively extract only a
portion of a particular request as it needs the information.
The destination information may be used by memory controller
210 to determine whether to send data back through the
memory queue manager 206 (for transactions requiring
snooping), or to send the read data directly to the
requesting device (for non-snooping transactions). The
physical structure of read request queue 220 may be similar
to that of write request queue 222 illustrated in Fig. 6B.

[0088] It is additionally noted that one or more read
holding buffers may be included within memory queue manager
206 to hold read data from memory destined to a snooping
interface while the CPU snoop is effectuated. This allows a
temporary location for read data from main memory 104 to
reside until it is determined whether a snoop writeback
occurs, in which case the writeback data is sent to the
requesting interface. It also allows a temporary location
for the writeback data which arrives before it can be
delivered.

[0089] In one implementation, memory queue arbiter 626
receives a single request from each connected interface. It
is noted that in one embodiment, AGP interface 214 may be
treated as two separate interfaces for arbitration purposes,
one for certain AGP mode requests and one for PCI mode
requests. The request received from each interface may be a
read request, a write request or some other request type,
such as an unlock request associated with certain locked
transactions, among others. Certain special requests may not
be queued within either read request queue 220 or write
request queue 222, depending upon the nature of the request.
For example, lock and unlock requests may not be provided to
the queues. In addition, some requests may only be available
from certain interfaces. It is further noted that high
priority AGP read requests, as well as requests to read the
GART table from main memory 104, may be treated by memory
queue manager 206 and memory controller 210 with a higher
priority than all other incoming requests. To facilitate
these high priority AGP related requests, additional
arbitration and queueing mechanisms may be provided to
arbitrate the high priority requests and queue the high
priority requests for servicing by memory controller 210.
These mechanisms may be implemented substantially
independently of the depicted portions of memory queue
manager 206.
[0090] Requests are recognized by memory queue arbiter 626
and loaded into the appropriate request queue (i.e., either
read request queue 220 or write request queue 222) as long
as there are empty slots in the queues. When all of a
particular request queue's slots are filled, the requests
are left pending and the interfaces cannot issue more
requests before their current ones are acknowledged.

[0091] Memory queue arbiter 626 implements a round-robin
priority scheme to allow fair access to memory for all
interfaces. To implement the round-robin priority scheme,
memory queue arbiter 626 maintains a priority ranking to
determine which device gets serviced next, provided there
are multiple requests pending. If there is only one
request pending among the devices, that request is
serviced immediately. When multiple requests are pending,
they are serviced based on their priority rankings. The
priority ranking is updated as long as a request is loaded
into a request queue and an acknowledge is asserted to the
requesting device. When there are no requests pending, the
memory queue arbiter 626 parks at the CPU interface 204 to
reduce the latency of initial CPU read cycles and resets
the priority scheme. In some circumstances, memory queue
arbiter 626 may select the CPU interface 204 or GART
interface (not shown) out of order temporarily to handle
snooping or AGP related address translation. Such out of
order arbitration does not affect the saved priority
rankings. Memory queue arbiter 626 controls multiplexed
data paths, depicted generally as block 615, which control
which interface is connected to provide a memory request
to a given queue. Once an interface is selected, logic
embodied in queue memory control unit 624 controls
snooping and queue loading.
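
The round-robin arbitration with parking described above can
be sketched in software as follows. This is only an
illustrative model, not the disclosed circuit: the class name
RoundRobinArbiter, the interface labels, and the rule that a
serviced interface drops to the lowest priority are
assumptions made for the example.

```python
class RoundRobinArbiter:
    """Toy model of a round-robin arbiter that parks on a default interface."""

    def __init__(self, interfaces, park="CPU"):
        self.initial = list(interfaces)   # order restored on reset (assumed)
        self.ranking = list(interfaces)   # highest priority first
        self.park = park

    def grant(self, pending):
        """Return the interface to service next, given the set of requesters."""
        if not pending:
            # Idle: park on the CPU interface and reset the priority scheme.
            self.ranking = list(self.initial)
            return self.park
        winner = next(i for i in self.ranking if i in pending)
        # Assumed update rule: the serviced interface drops to lowest priority.
        self.ranking.remove(winner)
        self.ranking.append(winner)
        return winner


arb = RoundRobinArbiter(["CPU", "PCI", "AGP"])
# With every interface requesting continuously, grants rotate fairly.
order = [arb.grant({"CPU", "PCI", "AGP"}) for _ in range(4)]
```

With all three interfaces requesting, the grants rotate
through CPU, PCI and AGP before returning to CPU; with no
requests pending, the model parks on the CPU interface.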
[0092] Since in one implementation each interface coupled
to memory queue manager 206 can present only a single
request to the memory queue manager 206 at a time, and
since memory queue arbiter 626 implements a fairness
algorithm such as round-robin to arbitrate among the
requests, additional interfaces may easily be
incorporated, such as interfaces for a USB (Universal
Serial Bus) and/or an IEEE 1394 (FireWire) bus, among
others, without significantly changing the design.
Importantly, devices on such added buses as well as
devices on the PCI bus 114 and AGP bus 110, both
isochronous and asynchronous, are provided fair access to
main memory 104.

[0093] Various aspects regarding the snooping of requests
pending within read request queue 220 and write request
queue 222 are next considered. In one implementation, read
requests from every device interface must snoop pending
write addresses in write request queue 222. This write
request queue snooping preserves ordering from the
perspective of each interface; if a device writes and then
reads the same address, it needs to receive that
just-written data. If the write were in the write request
queue 222 and ignored, the read may receive obsolete data
from main memory 104.
[0094] To complete the write and read request queue snoop
quickly (e.g., in less than one clock cycle), write
request queue snoop logic 610 and read request queue snoop
logic 612 may be configured to compare only a subset of
the addresses associated with the pending requests for
snooping purposes. In one implementation, the snooping
logic compares 14 bits of the addresses (e.g., bits 25:11
of the system address). It is understood that the
selection of the number of bits for address comparison
during the snooping operation is dependent upon the speed
at which the comparison operation must be performed and
upon the acceptable tolerance of performance degradation
due to the increased possibility of false hits.
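
The trade-off between a fast partial compare and false hits
can be illustrated with a short sketch. The function names
snoop_key and snoop_hit and the sample addresses are
hypothetical; the bit range is a parameter whose default
simply mirrors the example range quoted above.

```python
def snoop_key(address, hi=25, lo=11):
    """Extract address bits hi..lo (inclusive) for a fast partial compare."""
    width = hi - lo + 1
    return (address >> lo) & ((1 << width) - 1)


def snoop_hit(new_address, pending_addresses, hi=25, lo=11):
    """Report a hit if any pending address matches on the compared bits."""
    key = snoop_key(new_address, hi, lo)
    return any(snoop_key(a, hi, lo) == key for a in pending_addresses)


pending = [0x00123800]
true_hit = snoop_hit(0x00123800, pending)    # same address: genuine hit
false_hit = snoop_hit(0x04123800, pending)   # differs only in bit 26: false hit
miss = snoop_hit(0x00124000, pending)        # differs inside the compared range
```

A false hit costs only performance (an unnecessary flush or
stall), never correctness, because any two equal addresses
always match on the compared subset of bits.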

[0095] The snooping of previous requests within write
request queue 222 may be implemented in various ways. In
one implementation, if a write request queue hit occurs
relating to a new read request from a particular
interface, the read request is not acknowledged until a
write request queue flush operation has occurred. All
write operations prior to and including the write request
operation which resulted in the snoop hit are serviced by
memory controller 210. In this manner, the previous write
operation to the same address is forced to complete
before the read operation, thus ensuring coherency. After
the write request queue flush has begun, the queue memory
control unit 624 can load the read request into the read
request queue 220, and a CPU snoop command for the read
operation (if necessary) may be provided to CPU interface
204 to issue a CPU snoop for the read transaction.
[0096] In another implementation of the snooping of write
request queue 222, a counter is associated with each
location of read request queue 220. When a new read
request is received by memory queue manager 206, the
address residing in each location of the write request
queue 222 is compared with the address of the new read
request (or a certain subset of the address bits are
compared, as discussed above). If a snoop hit occurs with
respect to a particular entry in write request queue 222,
a value indicating the location of that entry is stored in
the counter associated with the location of read request
queue 220 in which the new read request is loaded. The
value thus indicates the depth of the hit in the write
request queue 222. Each time a write request is de-queued
from write request queue 222, the counter value associated
with the read request is decremented by one. The count
values associated with other read requests which contain
valid values indicating the depths of snoop hits in the
write request queue are similarly decremented. As read
requests are de-queued from read request queue 220 and
requests at the tail of the queue are shifted towards the
head of the queue, the count value associated with each
read request is shifted, unmodified, along with the read
request. If a particular read request gets to the top of
the queue with a count above zero, memory controller 210
will not service the read request until additional write
requests are serviced and the count value reaches zero.
[0097] Read request queue snooping may be similarly
performed when a write request is asserted by an
interface. More specifically, to avoid situations wherein
the memory controller may write ahead of a read
transaction to the same address, which may occur if the
write request almost full signal is asserted or another
read is causing a write request queue flush, read request
queue 220 is snooped before a write is loaded into write
request queue 222. This snooping may run while the write
data is still being gathered. If a hit occurs, the read
request queue 220 is flushed until the hit condition goes
away (i.e., the read request causing the hit is
de-queued). Alternatively, a counter may be associated
with each write request queue entry to track the number
of read requests which should be serviced prior to
servicing the write (i.e., a count value indicating the
depth of a hit in read request queue 220 may be
maintained, similar to the above description of the
snooping of write request queue 222).
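
The counter-based ordering between the two queues can be
modelled in a few lines. The sketch below covers only the
read-side counters (a read waits for earlier writes to the
same address); the class name SnoopingQueues, its methods,
and the policy of draining one write whenever the head read
is blocked are assumptions for illustration.

```python
from collections import deque


class SnoopingQueues:
    """Toy model: each queued read carries a count of writes it must wait for."""

    def __init__(self):
        self.writes = deque()   # pending write addresses, head at the left
        self.reads = deque()    # [address, count] pairs, head at the left

    def post_write(self, address):
        self.writes.append(address)

    def post_read(self, address):
        # Depth of the deepest matching write: the number of writes that must
        # drain before this read may be serviced (0 means no snoop hit).
        depth = 0
        for position, pending in enumerate(self.writes, start=1):
            if pending == address:
                depth = position
        self.reads.append([address, depth])

    def retire_write(self):
        self.writes.popleft()
        for entry in self.reads:
            if entry[1] > 0:
                entry[1] -= 1   # one fewer write ahead of this read

    def service_next(self):
        """Service the head read if its count is zero; else drain one write."""
        if self.reads and self.reads[0][1] == 0:
            return ("read", self.reads.popleft()[0])
        if self.writes:
            address = self.writes[0]
            self.retire_write()
            return ("write", address)
        return None


q = SnoopingQueues()
q.post_write(0x100)
q.post_write(0x200)
q.post_write(0x300)
q.post_read(0x200)   # hits the second pending write, so its count is 2
trace = [q.service_next() for _ in range(4)]
```

In this run the read of 0x200 is held back until the writes
to 0x100 and 0x200 have drained, while the unrelated write to
0x300 is allowed to complete after the read.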

[0098] After snooping of the read request queue 220
occurs, memory queue manager 206 may further send a snoop
request to the CPU interface 204. As stated previously,
memory queue arbiter 626 temporarily departs from its
normal priority scheme and starts servicing the CPU
interface until the snoop results are available. If a
cache hit occurs, memory queue arbiter 626 remains at CPU
interface 204 until writeback data is sent from cache
memory to main memory 104. After the writeback request
completes, memory queue arbiter 626 returns to the
requesting interface. Once the snoop is finished and the
memory queue arbiter 626 has returned to the requesting
device interface, it loads the write request into write
request queue 222 and proceeds to handle other requests as
needed. It is noted that writeback data could be merged
with data associated with an incoming write request using
the byte enables of the write request as a mask. It is
similarly noted that for certain read requests, after
snooping of write request queue 222 occurs, memory queue
manager 206 may send a snoop request to the CPU interface
204. Writeback data corresponding to a modified hit line
may be snarfed and provided to the requesting interface
prior to storage of the writeback data into main memory
104.
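
The merging of writeback data with an incoming write under a
byte-enable mask can be sketched as below. The function name
merge_with_byte_enables is hypothetical, and the sketch
assumes active-high enables (a set enable selects the byte
from the incoming write); actual bus signals may use the
opposite polarity.

```python
def merge_with_byte_enables(writeback, write_data, byte_enables):
    """Merge a write into writeback data; enabled bytes come from the write."""
    if not (len(writeback) == len(write_data) == len(byte_enables)):
        raise ValueError("all inputs must have the same length")
    return bytes(
        new if enabled else old
        for old, new, enabled in zip(writeback, write_data, byte_enables)
    )


merged = merge_with_byte_enables(
    writeback=bytes([0xAA] * 8),
    write_data=bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]),
    byte_enables=[1, 1, 0, 0, 0, 0, 1, 0],
)
```

Bytes 0, 1 and 6 of the result come from the incoming write;
the remaining bytes keep the writeback (cache) data.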
[0099] Referring back to Fig. 2, aspects regarding one 
implementation of PCI/AGP queue manager 208 will 
next be considered. As stated previously, PCI/AGP 
queue manager 208 is responsible for controlling re- 
quests passed between CPU interface 204, PCI inter- 
face 216 and AGP interface 214 that are not targeted to 
local memory (i.e., main memory 104). 
[0100] Fig. 7 depicts a generalized block diagram of an
embodiment of the PCI/AGP queue manager 208. A CPU bus
control unit 702 is shown coupled to a PCI bus control
unit 704 and an AGP bus control unit 706. A PCI NLM
arbiter 710 is shown as a portion of the functionality of
PCI bus control unit 704, and an AGP NLM arbiter 712 is
shown as a portion of the functionality of AGP bus control
unit 706.

[0101] CPU bus control unit 702 is configured to route
read and write requests from CPU interface 204 to a
targeted device. Various additional address and control
signals such as data acknowledges and retry signals may
further be communicated back from a targeted device to CPU
interface 204 through CPU bus control unit 702. In one
embodiment, CPU bus control unit 702 does not support the
pipelining of CPU cycles between different devices;
however, CPU pipelining to a single device may be
supported by CPU bus control unit 702.
[0102] There are two types of retry signals that may be
returned from a target device. The first one, referred to
as a "retry", may be asserted from either the PCI or AGP
master interface on non-posted cycles from CPU 101, which
indicates that the cycle was retried by a target on either
the PCI or AGP bus. In this case, CPU interface 204 snoop
stalls the CPU bus 103 until the retry signal is asserted.
In one embodiment, CPU interface 204 always snoop stalls
non-posted cycles so that in the event the target bus
retries the cycle, CPU interface 204 can exit from the
snoop phase by instructing CPU 101 to retry the cycle.
[0103] The second retry type is referred to as "fast
retry", and can be asserted for two different reasons. The
first case is a special case that involves PCI bus 114. If
the FLUSHREQ_ signal is asserted, it is an indication from
secondary bridge logic unit 116 (e.g., Fig. 1) that an ISA
device might do some transfers across PCI bus 114 to main
memory 104. The FLUSHREQ_ signal not only flushes out any
pending CPU to PCI cycles, but also causes the PCI master
interface 402 to assert a signal which causes all incoming
CPU cycles targeted to PCI to be retried immediately once
they enter their snoop phase. This prevents the CPU to PCI
request buffer from getting filled again. The PCI NLM fast
retry signal may also be provided to PCI bus control unit
704 to cause PCI NLM arbiter 710 to give priority to CPU
bus control unit 702 in order to flush any CPU to PCI
data. CPU interface 204 may further be configured such
that, in such cases, a signal is sent back to PCI
interface 216 indicating that incoming cycles on the CPU
bus 103 which were targeted to PCI bus 114 were retried
while the fast retry signal was asserted. This signal may
be used to cause PCI master interface 402 to request
ownership of PCI bus 114 in anticipation that incoming
cycles are going to need to be run on PCI bus 114.
[0104] The second case in which a fast retry signal may be
asserted involves a coherency issue. Whenever a PCI device
(or a PCI device connected to AGP bus 110) requests a read
from main memory 104 to read a flag set by CPU 101
indicating that a data transfer from the CPU to PCI (or
AGP) has completed, any posted data from the CPU to PCI
(or AGP) needs to be flushed to assure that the data
transfer has actually completed. In this case, the PCI (or
AGP) slave interface 410 asserts a fast retry signal when
it detects that a PCI (or AGP) bus master has requested a
read from memory. This prevents any more CPU cycles to PCI
and AGP from being accepted by CPU interface 204, and may
guarantee that there will be no snoop stalls run on CPU
bus 103 for the CPU cycles that get retried. This may
minimize the latency for getting snoop results back for
the snoop cycle that will be run on CPU bus 103 (as a
result of the memory read request). For this reason,
whenever CPU interface 204 detects assertion of the fast
retry signal, it will retry all cycles that are targeted
for PCI bus 114 and PCI mode transfers on AGP bus 110.

[0105] PCI bus control unit 704 includes PCI NLM arbiter
710 which is configured to arbitrate between write and
read requests to PCI bus 114 from CPU interface 204. It is
noted that PCI NLM arbiter 710 may further be configured
to arbitrate requests from other buses, such as an IEEE
1394 bus or a USB bus, if connected. Once a device has won
arbitration, PCI bus control unit 704 passes various
request information to PCI master interface control unit
402 such as address, byte enables, and other control
information. PCI NLM arbiter 710 employs a round-robin
arbitration scheme. In addition, in one embodiment, PCI
NLM arbiter 710 is advantageously configured to park on
the CPU interface 204 any time there are no requests
pending from any other requesting devices. An arbitration
cycle occurs whenever the PCI master interface returns an
address acknowledge while an address request is active, or
when the arbiter is parked on CPU interface 204 and a
request from a device other than CPU interface 204 is
asserted. PCI NLM arbiter 710 may be configured to park on
a winning device to allow multiple sequential quadword
transfers. Furthermore, PCI NLM arbiter 710 may be
configured to support locked cycles from the CPU which
will park the arbiter to the CPU interface. Additionally,
when a fast retry signal is asserted from PCI interface
216, PCI NLM arbiter 710 will park to CPU interface 204 in
order to flush out all CPU to PCI requests.
[0106] AGP bus control unit 706 is similarly provided to
control requests to AGP interface 214. An AGP NLM arbiter
712 is configured to arbitrate between write and read
requests from CPU interface 204, and write requests from
PCI interface 216. It is noted that AGP NLM arbiter 712
may further be configured to arbitrate requests of
additional buses, if incorporated. When a device has won
arbitration, AGP bus control unit 706 passes the request
to AGP interface 214, including address, byte enables, and
other control information.
[0107] Similar to PCI NLM arbiter 710, AGP NLM arbiter 712
also employs a round-robin arbitration scheme, with
parking on CPU interface 204 anytime there are no requests
pending from any other requesting devices. AGP NLM arbiter
712 may further be configured to park on a particular
requesting device during multiple sequential quadword
transfers, and also supports locked cycles from the CPU
interface, which will park the AGP NLM arbiter on the CPU
interface. If a fast retry signal is asserted by AGP slave
interface 502, AGP NLM arbiter 712 will park to CPU
interface 204 in order to flush out all CPU to AGP (PCI
mode) requests.
[0108] Fig. 8 is a block diagram illustrating aspects of 
one embodiment of CPU interface 204 in greater detail. 
Particularly, Fig. 8 illustrates aspects of a fetch mecha- 
nism for implementing the adaptive speculative read al- 
gorithm implemented by CPU interface 204. Circuit por- 
tions corresponding to those of Fig. 3 are numbered 
identically for simplicity and clarity. 
[0109] As illustrated in Fig. 8, in one embodiment CPU bus
interface control unit 302 illustratively includes a fetch
control unit 802 configured to control the generation of
speculative read requests. A speculative address register
804 is further shown coupled to CPU bus interface control
unit 302 and read back buffer 306.
[0110] Fig. 9 is a flow diagram illustrating functionality
associated with CPU interface 204 including functionality
associated with speculative fetches of data from main
memory 104, as controlled by fetch control unit 802.
Referring collectively to Figs. 8 and 9, when a memory
read request (step 902) is received from CPU 101 by
in-order queue 304, a request to read the line of data
associated with the request is provided to the CPU memory
transient buffer 308 during either step 906 or 908 (if
valid data corresponding to the request does not already
reside in read back buffer 306). The fetch control unit
802 determines if that requested line of data is
sequential to the most recent previous memory read request
provided to the CPU memory transient buffer 308 (step
904). If the request is not sequential to the line of the
previous request, no further actions are taken by fetch
control unit 802 (i.e., the request for the line of data
encompassing the current request is provided to the CPU
memory transient buffer 308) (step 906). If, on the other
hand, the request is sequential with respect to the most
recent previous read request provided to the CPU memory
transient buffer 308, fetch control unit 802 generates a
speculative request to fetch the next line of data
sequential to the current request (step 910). The original
request and the speculative request are provided to the
CPU memory transient buffer 308. The fetch control unit
802 continues to issue further speculative requests if CPU
101 initiates additional requests to sequential lines of
data (steps 912 and 914). When CPU 101 requests data
corresponding to a non-sequential line, a speculative
request is not generated by fetch control unit 802.
Instead, a request to fetch the line of data corresponding
to the non-sequential request is provided to the CPU
memory transient buffer 308 via in-order queue 304 (step
916).

[0111] It is noted that in one embodiment, speculative
fetches are not effectuated over a 2K page boundary. This
limits the size of the internal counter mechanism to
thereby meet timing constraints. Similarly, two cacheline
requests to main memory that cross a 2K page boundary may
not be allowed. In addition, in one implementation
sequential reads must be full cacheline requests in order
to invoke a speculative read; quadword requests will
result in a miss (i.e., be considered non-sequential) even
if their addresses are sequential.
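
The two eligibility conditions above (no speculation across a
2K page boundary, and only for full cacheline requests) can be
expressed as a simple predicate. The function name
may_speculate is hypothetical, and the 32-byte line size is an
assumption chosen only for the example; the 2K page size
comes from the text.

```python
LINE_SIZE = 32      # bytes per cache line (assumed for illustration)
PAGE_SIZE = 2048    # 2K page boundary from the text


def may_speculate(line_address, is_full_cacheline):
    """Return True if the next sequential line may be speculatively fetched."""
    if not is_full_cacheline:
        return False  # quadword requests are treated as non-sequential
    next_line = line_address + LINE_SIZE
    # The speculative line must lie in the same 2K page as the current line.
    return line_address // PAGE_SIZE == next_line // PAGE_SIZE


within_page = may_speculate(0x0000, True)
last_line_of_page = may_speculate(0x07E0, True)   # next line would be 0x0800
quadword_request = may_speculate(0x0040, False)
```

Only the first case permits a speculative fetch; the second
would cross into the next 2K page and the third is not a full
cacheline request.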
[0112] Speculative address register 804 stores an address
corresponding to speculative data within read back buffer
306. The address of speculative data within read back
buffer 306 is maintained to allow snooping in the event
write cycles are detected. If a write to main memory 104
matches an address stored in speculative address register
804, CPU bus interface control unit 302 invalidates the
corresponding speculative data in read back buffer 306.
This may improve the speed of cache line replacements by
CPU 101, or prevent CPU 101 from receiving stale data in
the event the write cycle is initiated from another bus.
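
The invalidation of speculative data on a matching write can
be modelled as follows. This is a simplified sketch holding a
single speculative line; the class name ReadBackBuffer and its
methods are hypothetical and stand in for the read back buffer
together with its speculative address register.

```python
class ReadBackBuffer:
    """Toy model of one speculative line tracked by an address register."""

    def __init__(self):
        self.spec_address = None  # models the speculative address register
        self.spec_data = None
        self.valid = False

    def fill_speculative(self, line_address, data):
        self.spec_address, self.spec_data, self.valid = line_address, data, True

    def snoop_write(self, line_address):
        """Invalidate the speculative line if a write targets its address."""
        if self.valid and line_address == self.spec_address:
            self.valid = False

    def lookup(self, line_address):
        """Return the speculative data on a valid hit, otherwise None."""
        if self.valid and line_address == self.spec_address:
            return self.spec_data
        return None


rbb = ReadBackBuffer()
rbb.fill_speculative(0x1040, b"old line")
hit_before = rbb.lookup(0x1040)
rbb.snoop_write(0x2000)   # unrelated write: speculative data stays valid
still_valid = rbb.lookup(0x1040)
rbb.snoop_write(0x1040)   # matching write: speculative data is invalidated
hit_after = rbb.lookup(0x1040)
```

After the matching write, a later read of the same line misses
in the buffer and must be fetched again from memory, so the
stale copy is never returned.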

[0113] The adaptive speculative read algorithm implemented
by CPU interface unit 204 may advantageously improve the
hit rate associated with speculative memory fetch
operations. More specifically, speculative fetches are not
performed unless a history of sequential accesses is
detected by determining that an immediately preceding
fetch to a contiguous line of memory had occurred. That
is, speculative read data is prefetched only in response
to receiving a sequential line request. During
non-sequential access patterns, the number of read
requests will follow a pattern of 1-1-1-1. Thus,
inaccurate speculative fetching of data may be prevented.
On the other hand, when sequential requests are
repetitively made, a read request pattern of 1-2-1-1-1
results, wherein speculative data is continuously fetched
upon detection of the first sequential access until the
sequential accesses terminate. Better hit rates and
efficiency may thereby be attained, and memory bandwidth
and power may be conserved.
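
The 1-1-1-1 and 1-2-1-1-1 patterns can be reproduced with a
small simulation. This is a sketch under simplifying
assumptions, not the disclosed hardware: the function name
fetch_pattern is hypothetical, lines are identified by integer
line numbers, only one speculative line is held at a time, any
speculative line is dropped on a non-sequential request, and
page boundaries and write invalidation are ignored.

```python
def fetch_pattern(requested_lines):
    """Return the number of memory line fetches issued per CPU read request."""
    previous = None      # line number of the previous CPU request
    speculative = None   # line currently held speculatively, if any
    pattern = []
    for line in requested_lines:
        fetches = 0
        if line != speculative:
            fetches += 1             # demand fetch of the requested line
        if previous is not None and line == previous + 1:
            speculative = line + 1   # prefetch the next sequential line
            fetches += 1
        else:
            speculative = None       # no sequential history: no prefetch
        previous = line
        pattern.append(fetches)
    return pattern


sequential = fetch_pattern([10, 11, 12, 13, 14])
scattered = fetch_pattern([10, 40, 7, 90])
```

For the sequential stream, the second request issues two
fetches (the requested line plus the next one); each later
request is satisfied by the previously prefetched line and
issues only the next speculative fetch, giving 1-2-1-1-1. The
scattered stream never triggers a prefetch and gives 1-1-1-1.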

[0114] It is noted that other specific implementations 
of various aspects of bridge logic unit 102 are possible. 
For example, additional request types may be support- 
ed by the bus interfaces, as desired, depending upon 
the overall requirements of the system. Furthermore, 
other specific implementations of the various bus inter- 
faces as well as of a memory queue manager and a non- 
local memory manager are possible. 
[0115] In addition, in another embodiment, the adap- 
tive speculative read functionality as described above 
may be performed in response to read requests of other 
devices. For example, a memory control apparatus 
could be configured to implement adaptive speculative 
reads as described above in response to memory read 
requests of a peripheral device residing on a PCI bus. 
[0116] Numerous variations and modifications will become
apparent to those skilled in the art once the above
disclosure is fully appreciated. It is intended that the
following claims be interpreted to embrace all such
variations and modifications.



Claims 

1. A computer system comprising:

a microprocessor;

a keyboard operatively coupled to said microprocessor;

a memory; and

a memory control apparatus coupled to receive memory
requests from said microprocessor and to control accesses
to said memory, wherein said memory control apparatus is
configured to fetch a single line of data from said memory
in response to an initial read request and to fetch a pair
of sequential lines of data from said memory in response
to detecting a subsequent read request to a sequential
line.

2. The computer system as recited in claim 1, wherein said
memory control apparatus includes a CPU interface coupled
to said microprocessor, wherein said CPU interface is
configured to receive said memory requests from said
microprocessor.

3. The computer system as recited in claim 2, further
comprising a memory queue manager coupled to said CPU
interface, wherein said memory queue manager includes a
read request queue configured to store read requests to
said memory.

4. The computer system as recited in claim 3, further
comprising a memory controller coupled to said read
request queue and to said memory, wherein said memory
controller is configured to de-queue said memory read
requests from said read request queue and access said
memory in response thereto.

5. The computer system as recited in claim 4, wherein said
CPU interface includes a read back buffer coupled to said
memory controller, wherein said read back buffer is
configured to store data corresponding to said initial
read request returned from said memory.

6. The computer system as recited in claim 5, wherein said
CPU interface further includes a CPU bus interface control
unit coupled to said read back buffer, wherein said CPU
bus interface control unit is configured to generate a
request to fetch said pair of sequential lines of data
from said memory in response to detecting said subsequent
read request to said sequential line following said
initial read request.

7. The computer system as recited in claim 6, wherein said
read back buffer is further configured to store said pair
of sequential lines of data from said memory.

8. The computer system as recited in claim 7, further
comprising a speculative address register coupled to said
CPU bus interface control unit, wherein said speculative
address register is configured to store an address
associated with a speculative line of data stored in said
read back buffer.

9. The computer system as recited in claim 8, wherein said
CPU bus interface control unit is further configured to
invalidate said line of speculative data in said read back
buffer in response to detecting a write to said address
stored in said speculative address register.

10. The computer system as recited in claim 9, wherein
said write is initiated by a device operatively coupled to
said memory control apparatus.

11. A computer system comprising:

a microprocessor coupled to a CPU bus;

at least one peripheral device operatively coupled to a
peripheral bus wherein said at least one peripheral device
includes a disk drive apparatus;

a memory coupled to a processor bus; and

a bridge logic unit coupled to receive memory requests
from said microprocessor and from said at least one
peripheral device and configured to control accesses to
said memory, wherein said bridge logic unit includes:

a CPU interface coupled to receive requests from said
microprocessor and having a memory fetch control unit
configured to fetch a single line of data from said memory
in response to an initial read request from said
microprocessor and to fetch a pair of sequential lines of
data from said memory in response to detecting a
subsequent read request by said microprocessor to a
sequential line.

12. The computer system as recited in claim 11, further
comprising a memory queue manager coupled to said CPU
interface, wherein said memory queue manager includes a
read request queue configured to store read requests to
said memory.

13. The computer system as recited in claim 12, further
comprising a memory controller coupled to said read
request queue and to said memory, wherein said memory
controller is configured to de-queue said memory read
requests from said read request queue and access said
memory in response thereto.

14. The computer system as recited in claim 13, wherein
said CPU interface includes a read back buffer coupled to
receive and store read data from said memory.

15. The computer system as recited in claim 14, wherein
said read back buffer is further configured to store said
pair of sequential lines of data from said memory.

16. The computer system as recited in claim 15, wherein
said memory fetch control unit is configured to fetch an
additional single line of data from said memory in
response to a non-sequential read request.

17. The computer system as recited in claim 16, wherein
said CPU interface further includes a speculative address
register configured to store an address of a speculative
line of data stored in said read back buffer.

18. A method for speculatively prefetching data from a
main memory of a computer system comprising:

a device initiating a memory read request;

a memory control apparatus receiving said memory read
request and responsively fetching a single line of data
from said main memory;

said device initiating a subsequent read request to a
sequential line; and

said memory control apparatus fetching a pair of
sequential lines of data from said main memory in response
to detecting said subsequent read request to said
sequential line.

19. The method as recited in claim 18 further comprising
storing said single line of data read from said main
memory into a read back buffer.

20. The method as recited in claim 18, in which said
memory control apparatus fetching another single line of
data from said main memory in response to receiving a
non-sequential memory read request.

21. The method as recited in claim 19, further comprising
storing said pair of sequential lines of data into said
readback buffer.

22. A computer system comprising:

a memory;

a device configured to initiate memory requests to read
data from said memory;

a memory control apparatus coupled to receive said memory
requests from said device and to control accesses to said
memory, wherein said memory control apparatus is
configured to fetch a single line of data from said memory
in response to an initial read request and to fetch a pair
of sequential lines of data from said memory in response
to detecting a subsequent read request to a sequential
line; and

a display operatively coupled to said memory control
apparatus.



[Drawing sheets, figure numbers not legible. Recoverable
labels: Main Memory 104; Bridge Logic Unit 102; Memory
Bus.]
[FIG. 3: CPU interface 204. Labels: Snoop Control Unit
316; In-Order Queue 304; CPU Bus Interface Control Unit
302; CPU to Memory Transient Buffer 308; Read back/Write
back Buffer 306; CPU to NLM Transient Buffer 310;
reference numerals 307 and 312; connections to Memory
Queue Manager 206, to PCI/AGP Queue Manager 208, and
to/from Memory Controller 210.]



[FIG. 4A: Slave Interface 410. Labels: PCI Slave Address
Buffer 414; PCI Slave Transient Read Buffer 416; PCI
Interface Slave Control Unit 412; PCI Slave Transient
Write Buffer 418; PCI Interface Master Control Unit 402.]



[FIG. 5: AGP interface 214. Labels: AGP Slave Interface;
Address Decode and Queue 510; AGP Slave Transient Buffer
506; AGP Interface Slave Control Unit 504; AGP Slave
Transient Write Buffer 508; AGP Arbiter 511; PCI-mode
Interface; connections to Memory Queue Manager 206,
PCI/AGP Queue Manager 208, and Memory Controller 210.]



[FIG. 6A: Memory Queue Manager 206. Labels: Data paths
615; Memory Queue Arbiter 626; WRQ Snoop Logic 610; Write
Data Queue 602; Write Request Queue 222; Queue Memory
Control Unit 624; Read Request Queue 220; RRQ Snoop Logic
612; connections to CPU Interface 204, PCI Interface 216,
AGP Interface 214, other interfaces (if provided), and
Memory Controller 210.]



[Drawing sheets, figure numbers not legible. Recoverable
labels: AGP NLM Arbiter; AGP Bus Control Unit.]
[FIG. 8: CPU interface 204. Labels: Snoop Control Unit
316; In-Order Queue 304; CPU Bus Interface Control Unit
302; Fetch Control Unit 802; CPU to Memory Transient
Buffer 308; Speculative Address Register 804; CPU to NLM
Transient Buffer 310; Read Back Buffer 306; reference
numerals 307 and 312; connections to Memory Queue Manager
206, PCI/AGP Queue Manager 208, and Memory Controller
210.]






FIG. 9 (flowchart). Labeled steps:
Receive Memory Read Request 902;
[decision block, text not legible; a "No" branch is shown];
Fetch Requested Line Only 906;
Fetch Requested Line 908;
Speculatively Fetch Next Line 910.






(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 924 620 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3: 28.06.2000 Bulletin 2000/26

(43) Date of publication A2: 23.06.1999 Bulletin 1999/25

(51) Int. Cl.7: G06F 13/16, G06F 13/40

(21) Application number: 98310413.4

(22) Date of filing: 18.12.1998

(84) Designated Contracting States:
     AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
     Designated Extension States:
     AL LT LV MK RO SI

(30) Priority: 22.12.1997 US 996310

(71) Applicant: Compaq Computer Corporation
     Houston, Texas 77070-2698 (US)

(72) Inventors:
     • Maguire, David J.
       Spring, Texas 77379 (US)
     • Melo, Maria L.
       Houston, Texas 77070 (US)
     • Foster, Joseph E.
       Spring, Texas 77379 (US)

(74) Representative: Brunner, Michael John
     GILL JENNINGS & EVERY
     Broadgate House
     7 Eldon Street
     London EC2M 7LH (GB)





(54) Computer system including a bus bridge implementing adaptive speculative read operations 



(57) A computer system includes a microprocessor
coupled to a main memory through a bridge logic unit.
The bridge logic unit receives memory read requests
from the microprocessor and provides the requests to
the main memory. The bridge logic unit includes a
memory fetch control unit configured to fetch a single line of
data from the main memory in response to an initial read
request from the microprocessor. If a read request to a
sequential line of data is received from the
microprocessor, the memory fetch control unit fetches not only the
requested line of data but also the next sequential line
of data. Thus, following the initial read request in which
a single line of data is fetched, when the microprocessor
issues a request for data from a sequential line, that line
is fetched and the subsequent line is speculatively
prefetched. If the microprocessor continues with a
request to yet an additional sequential line, the memory
fetch unit continues its speculative generation of a
request for the next sequential line. If the microprocessor
issues a memory read request to a non-sequential line
of data, the memory fetch control unit fetches only that
line of data.
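
The fetch policy summarized in the abstract can be illustrated with a short software model. The following Python sketch is illustrative only and is not taken from the specification: the 32-byte line size, the class name FetchControl, its two fields, and the read() method are all assumptions introduced here. The sketch further assumes that a line already brought in by a speculative fetch is not requested from memory a second time, a detail the abstract does not state.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_SIZE = 32  # bytes per line; an assumed value, not stated in the abstract


@dataclass
class FetchControl:
    """Hypothetical software model of the memory fetch control unit."""

    # Line index that would count as "sequential" for the next read.
    next_sequential: Optional[int] = None
    # Line index already obtained by a speculative fetch, if any (assumption).
    prefetched: Optional[int] = None

    def read(self, address: int) -> List[int]:
        """Return the line indices requested from main memory for this read."""
        line = address // LINE_SIZE
        fetches: List[int] = []
        if line == self.next_sequential:
            # Sequential read: fetch the requested line unless it was
            # already prefetched, then speculatively fetch the next line.
            if line != self.prefetched:
                fetches.append(line)
            fetches.append(line + 1)
            self.prefetched = line + 1
        else:
            # Initial or non-sequential read: fetch only the requested line.
            fetches.append(line)
            self.prefetched = None
        self.next_sequential = line + 1
        return fetches


if __name__ == "__main__":
    fc = FetchControl()
    for addr in (0x000, 0x020, 0x040, 0x200):
        print(hex(addr), fc.read(addr))
```

In this model an initial or non-sequential read yields a single line request, while each read that continues a sequential run yields a speculative request for the following line, which mirrors the adaptive behavior described above without claiming to reproduce the hardware implementation.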






FIG. 1 (drawing; labels largely illegible in this copy. Recoverable
labels: Logic; Bus (PCI); (AGP); Secondary Bridge Logic; (ISA/EISA).)



Printed by Jouve, 75001 PARIS (FR)






European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 98 31 0413

DOCUMENTS CONSIDERED TO BE RELEVANT

(Columns: Category; Citation of document with indication, where
appropriate, of relevant passages; Relevant to claim. The category
letters are not legible in this copy.)

Citation: US 5 586 294 A (GOODWIN ET AL)
          17 December 1996 (1996-12-17)
          * column 3, line 25 - line 57 *
          * column 7, line 57 - column 11, line 17 *
          * figures 5-8 *
Relevant to claim: 1-22

Citation: "Method for "Smart Prefetch" of Data from
          Main Memory by a Memory Controller for
          Access by a CPU"
          IBM TECHNICAL DISCLOSURE BULLETIN,
          vol. 38, no. 9, September 1995 (1995-09),
          pages 203-204, XP000540239
          New York, US
          * the whole document *
Relevant to claim: 1,11,18,22

Citation: WO 96 22571 A (INTEL CORPORATION)
          25 July 1996 (1996-07-25)
          * page 6, line 1 - page 12, line 7 *
          * figures 1-3 *
Relevant to claim: 1,11,18,22

CLASSIFICATION OF THE APPLICATION (Int.Cl.7):
G06F13/16
G06F13/40

TECHNICAL FIELDS SEARCHED (Int.Cl.7):
G06F

The present search report has been drawn up for all claims

Place of search: THE HAGUE
Date of completion of the search: 4 May 2000
Examiner: McDonagh, F

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another
    document of the same category
A : technological background
O : non-written disclosure
P : intermediate document

T : theory or principle underlying the invention
E : earlier patent document, but published on, or
    after the filing date
D : document cited in the application
L : document cited for other reasons

& : member of the same patent family, corresponding
    document




ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO. EP 98 31 0413

This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report.
The members are as contained in the European Patent Office EDP file on 04-05-2000.
The European Patent Office is in no way liable for these particulars which are merely given for the purpose of information.

Patent document          Publication    Patent family      Publication
cited in search report   date           member(s)          date

US 5586294               17-12-1996     NONE

WO 9622571               25-07-1996     US 5630094 A       13-05-1997
                                        AU 4699396 A       07-08-1996
                                        DE 69604564 D      11-11-1999
                                        EP 0804763 A       05-11-1997

For more details about this annex: see Official Journal of the European Patent Office, No. 12/82






